{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Use MongoDBQueryEngine to query Markdown files \n",
    "\n",
    "This notebook demonstrates the use of the `MongoDBQueryEngine` for retrieval-augmented question answering over documents. It shows how to set up the engine with Docling parsed Markdown files, and execute natural language queries against the indexed data. \n",
    "\n",
    "The `MongoDBQueryEngine` integrates cloud MongoDB Atlas vector storage with LlamaIndex for efficient document retrieval."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index==0.12.16\n",
    "%pip install llama-index-vector-stores-mongodb==0.6.0\n",
    "%pip install llama-index-embeddings-huggingface==0.5.2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up Open AI key for query engine retrieval"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "import autogen\n",
    "\n",
    "config_list = autogen.config_list_from_json(env_or_file=\"../OAI_CONFIG_LIST\")\n",
    "\n",
    "assert len(config_list) > 0\n",
    "print(\"models to use: \", [config_list[i][\"model\"] for i in range(len(config_list))])\n",
    "\n",
    "# Put the OpenAI API key into the environment\n",
    "os.environ[\"OPENAI_API_KEY\"] = config_list[0][\"api_key\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up Mongo DB instance\n",
    "\n",
    "To use this notebook, you will need to have a MongoDB Atlas environment.\n",
    "For this notebook, we use a docker instance. please refer [MongoDB: Create a Local Atlas Deployment with Docker\n",
    "](https://www.mongodb.com/docs/atlas/cli/current/atlas-cli-deploy-docker/) for more info. \n",
    "\n",
    "Some info that you need to get include:\n",
    "- MongoDB Connection String (URI)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from autogen.agentchat.contrib.rag.mongodb_query_engine import MongoDBQueryEngine\n",
    "\n",
    "query_engine = MongoDBQueryEngine(\n",
    "    connection_string=\"mongodb://localhost:27017/?directConnection=true\",\n",
    "    database_name=\"vector_db\",\n",
    "    collection_name=\"test_collection\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we can see the default collection name in the vector store, this is where all documents will be ingested. When creating the `MongoDBQueryEngine` you can specify a `collection_name` and `database_name` to ingest into."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(query_engine.get_collection_name())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Initialize DB and ingest documents\n",
    "\n",
    "Let's ingest a document and query it.  \n",
    " ```init_db``` will overwrite the existing collection with the same name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "input_dir = \"../test/agents/experimental/document_agent/pdf_parsed/\"  # Update to match your input directory"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "input_docs = [input_dir + \"nvidia_10k_2024.md\"]  # Update to match your input documents\n",
    "query_engine.init_db(new_doc_paths_or_urls=input_docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the given collection already has the document you need, you can use ```connect_db``` to avoid overwriting the existing collection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine.connect_db()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "question = \"What is Nvidia's revenue in 2024?\"\n",
    "answer = query_engine.query(question)\n",
    "print(answer)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Great, we got the data we needed. Now, let's add another document."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_docs = [input_dir + \"Toast_financial_report.md\"]\n",
    "query_engine.add_docs(new_doc_paths_or_urls=new_docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And query again from the same database but this time for another corporate entity."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "question = \"What is the trading symbol for Toast\"\n",
    "answer = query_engine.query(question)\n",
    "print(answer)"
   ]
  }
 ],
 "metadata": {
  "front_matter": {
   "description": "Mongo DB Query Engine",
   "tags": [
    "agents",
    "documents",
    "RAG",
    "docagent",
    "mongodb",
    "query"
   ]
  },
  "kernelspec": {
   "display_name": "base",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
